Abstract

1. Species distribution models (SDMs) can be used to predict how individual species, and whole assemblages of species, will respond to a changing environment. Until now, these models have either assumed (1) that species' occurrence probabilities are uncorrelated, or (2) that species respond linearly to preselected environmental variables. These two assumptions currently prevent ecologists from modeling assemblages with realistic co-occurrence and species richness properties.

2. This paper introduces a stochastic feedforward neural network, called mistnet, which makes neither assumption. Thus, unlike most SDMs, mistnet can account for non-independent co-occurrence patterns driven by unobserved environmental heterogeneity. And unlike recently proposed Joint SDMs, mistnet can also learn nonlinear functions relating species' occurrence probabilities to environmental predictors.

3. Mistnet makes more accurate predictions about the North American bird communities found along Breeding Bird Survey transects than several alternative methods tested. In particular, typical assemblages held out of sample for validation were nearly 50,000 times more likely under the mistnet model than under independent combinations of single-species models.

4. Apart from improved accuracy, mistnet shows two other important benefits for ecological research and management. First, by analyzing co-occurrence data, mistnet can identify unmeasured (and perhaps unanticipated) environmental variables that drive species turnover. For example, mistnet identified a strong grassland/forest gradient, even though only temperature and precipitation were given as model inputs. Second, mistnet is able to take advantage of incomplete data sets to guide its predictions towards more realistic assemblages. For example, mistnet automatically adjusts its expectations to include more forest-associated species in response to a stray observation of a forest-dwelling warbler.

Introduction

Programs for managing and understanding biodiversity each require information about where species occur and where they could occur. Statistical approaches to these questions, such as species distribution models (SDMs), are important because they can help us anticipate how beneficial species might fare, or how harmful species might spread, in scenarios that we cannot observe directly (Elith & Leathwick 2009). Modern SDMs need not assume that species respond to environmental variation in a pre-specified way (e.g. linearly or quadratically); relaxing this assumption has substantially improved our ability to make predictions about where species can occur (Elith et al. 2006).

Unfortunately, existing nonlinear approaches do not always answer the most pressing questions for ecologists. Ecologists are not only interested in individual species; we are also interested in learning about higher-level patterns, such as community structure, species richness, species turnover, and alternative stable states (Chase 2003). While SDMs are often combined ("stacked") to generate assemblage-level predictions (Pellissier et al. 2013), doing so requires assuming that species' occurrence probabilities are uncorrelated (Clark et al. 2013; Calabrese et al. 2014). As shown in more detail below, ignoring these correlations leads stacked models to predict incoherent jumbles of species rather than realistic assemblages (Clark et al. 2013). A major source of non-independence among species, which stacked SDMs ignore, is shared dependence on unobserved environmental factors (McInerny & Purves 2011; Figure 1; Calabrese et al. 2014). Given that most models only use climate variables as predictors (Austin & Van Niel 2011), the set of unobserved factors will usually include all of ecology apart from climatic influences. SDMs' failure to model other ecological processes is thus widely considered to be a major omission from statistical ecology's toolbox (Austin & Van Niel 2011; Guisan & Rahbek 2011; Kissling et al. 2012; Wisz et al. 2013; Clark et al. 2013).

Figure 1: Unobserved environmental heterogeneity can induce correlations between species; ignoring this heterogeneity can produce misleading results. A: Based on climate predictors, a pair of single-species models might predict 50% occurrence probabilities for each of two wetland species (black cross). Climate predictors are not sufficient in this case, however: a site's suitability for these species cannot really be determined without information about the availability of wetland habitat. Real habitats will tend to be suitable for both species (dense cloud of points in upper-right corner) or neither (lower-left corner), depending on this unmeasured variable. B: This correlation among species substantially alters the set of assemblages one would expect to observe. (Under independence, all four possibilities would be equally probable.) C: Positive correlations among species can even induce a strongly bimodal distribution of species richness values.
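The effect described in Figure 1 can be reproduced with a small simulation. The sketch below is not part of mistnet; it assumes a single standard-normal latent factor per site (standing in for something like wetland availability) and an arbitrary effect size of 3 on the logit scale. The function name, sample sizes, and seeds are illustrative choices only.

```python
import math
import random
from collections import Counter

def simulate_sites(n_sites, n_species, effect=3.0, seed=1):
    """Simulate presence/absence when all species share one latent factor.

    Each site draws a latent value z ~ N(0, 1). Every species then occurs
    with probability logistic(effect * z), so each species has a marginal
    occurrence probability of 0.5 by symmetry.
    """
    rng = random.Random(seed)
    sites = []
    for _ in range(n_sites):
        z = rng.gauss(0.0, 1.0)
        p = 1.0 / (1.0 + math.exp(-effect * z))
        sites.append([int(rng.random() < p) for _ in range(n_species)])
    return sites

# Two species, as in panels A and B.
pairs = simulate_sites(20000, 2)
marginals = [sum(site[j] for site in pairs) / len(pairs) for j in range(2)]
p_both = sum(site == [1, 1] for site in pairs) / len(pairs)
p_neither = sum(site == [0, 0] for site in pairs) / len(pairs)
# What a stack of independent single-species models would predict.
p_both_if_independent = marginals[0] * marginals[1]

# Ten species, as in panel C: tabulate simulated species richness.
richness = Counter(sum(site) for site in simulate_sites(20000, 10, seed=2))
```

Under these assumptions, the "both present" and "both absent" outcomes are each noticeably more common than the 25% expected under independence, and richness values pile up near 0 and 10 rather than near 5.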



In the last few years, several mixed models have been proposed to help explain the co-occurrence patterns that stacked SDMs ignore (Latimer et al. 2009; Ovaskainen, Hottola & Siitonen 2010; Golding 2013; Clark et al. 2013; Pollock et al. 2014). These joint species distribution models (JSDMs) can produce mixtures of possible species assemblages (points in Figure 1A), rather than relying on a small number of environmental measurements to fully describe each species' probability of occurrence (which would collapse the distribution in Figure 1A to a single point; Pollock et al. 2014). In JSDMs (as in nature), a given set of temperature and precipitation measurements could be consistent with a number of very different possible sets of co-occurring species, depending on factors that ecologists have not necessarily measured or even identified as important. JSDMs represent these unobserved (latent) factors as random variables whose true values are unknown but whose existence would still help explain discrepancies between the data and the stacked SDMs' predictions (Figures 1B and 1C). While JSDMs represent a major advance in community-level modeling (Clark et al. 2013; Pollock et al. 2014), existing implementations have all assumed that species' responses to the environment are linear (in the sense of a generalized linear model). Thus, these JSDMs sacrifice the flexibility of modern single-species models, reducing their accuracy and limiting their utility.

Here, I present a new R package for assemblage-level modeling, called mistnet, that does not rely on independence (as stacks of single-species models do) or linearity (as previous JSDMs do). Mistnet is a stochastic feed-forward neural network (Neal 1992; Tang & Salakhutdinov 2013) that combines the nonlinear flexibility of modern single-species models with the latent variables found in previous JSDMs (cf. Hutchinson, Liu & Dietterich 2011). In order to demonstrate the value of this approach, I compared mistnet's predictive likelihood with that of several existing models, using observational data from thousands of North American Breeding Bird Survey transects (BBS; Sauer et al. 2011). A high predictive likelihood indicates that the model expects to see assemblages like those found along transects held out-of-sample, while a very low likelihood means that the model has effectively ruled those assemblages out due to overfitting or underfitting.

An accurate JSDM would open up new possibilities for research and effective management. For example, although most models only have access to climate data (Austin & Van Niel 2011), a successful model of community structure should also be able to identify the major axes of non-climate variation that drive species turnover based on the species' observed co-occurrence patterns. Moreover, a successful assemblage-level model would be able to take advantage of partially-completed samples or other kinds of prior information about a few species to inform its predictions about the rest of the assemblage. Since data collection efforts are frequently asymmetrical or incomplete, the ability to transfer information from well-documented taxa to more cryptic or rare species would prove valuable for community ecologists and conservationists alike. While a model's ability to infer, for example, that "waterbirds like water" would not provide any novel biological insights, it would demonstrate that a modeling framework is ready to tackle more difficult problems where the biology is not already known.

Materials and Methods

Methods are presented in four main sections: (1) an introduction to the data sets used in this analysis, (2) a description of mistnet, (3) a summary of the existing methods used for model comparison, and (4) criteria for model evaluation.

Data

Field survey data was obtained from the 2011 Breeding Bird Survey (BBS; Sauer et al. 2011). The BBS data consists of thousands of transects ("routes"), which I used as the main unit for my analysis. Each route includes 50 stops, about 0.8 km apart. At each stop, all the birds observed in a 3-minute period are recorded, using a standardized procedure. Following BBS recommendations, I omitted nonstandard routes and data collected on days with bad weather.

In order to evaluate SDMs' capacities for predicting species composition, I split the routes into a "training" data set consisting of 1559 routes and a "test" data set consisting of 280 routes (Figure 2; Appendix A). The two data sets were separated by a 150-km buffer to ensure that models could not rely on spatial autocorrelation to make accurate predictions about the test set (Bahn & McGill 2007; Appendix A). Each model was fit to the same training set, and then its performance was evaluated out-of-sample on the test set.




Figure 2: Map of the BBS routes used in this analysis. Black points are training routes; red 
ones are test routes. The training and test routes are separated by a 150-km buffer in order 
to minimize spatial autocorrelation across the two partitions. 

Observational data for each species was reduced to "presence" or "absence" at the route level, ignoring the possibility of observation error for the reasons outlined in Welsh, Lindenmayer & Donnelly (2013). It would be possible to incorporate the possibility of such errors in the model-fitting procedure if appropriate data were available, as was done in Hutchinson et al. (2011). A total of 368 species were chosen for analysis according to a procedure described in Appendix A.

To obtain environmental predictors for the model, I extracted the 18 Bioclim climate variables for each route from Worldclim (version 1.4; Hijmans et al. 2005). I omitted variables that were nearly collinear with one another (i.e. |r| > 0.8) using the findCorrelation function in the caret package (Wing et al. 2013), leaving eight climate-based predictors (Appendix A). Since most SDMs do not use land cover data (Austin & Van Niel 2011) and one of mistnet's goals is to make inferences about unobserved environmental variation, no other variables were included in this analysis.
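The collinearity filter can be illustrated with a short sketch. The code below is a simplified stand-in written for this illustration, not caret's findCorrelation implementation: it repeatedly finds the most strongly correlated pair of predictors above the cutoff and drops whichever member has the larger mean absolute correlation with the remaining predictors. The function names and the toy predictor values are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def drop_collinear(columns, cutoff=0.8):
    """Return the predictor names kept after removing near-collinear ones.

    columns maps each predictor name to its list of values across routes.
    """
    kept = list(columns)

    def mean_abs_cor(name):
        others = [o for o in kept if o != name]
        return sum(abs(pearson(columns[name], columns[o])) for o in others) / len(others)

    while True:
        # Find the most strongly correlated pair that exceeds the cutoff.
        worst = None
        for i, a in enumerate(kept):
            for b in kept[i + 1:]:
                r = abs(pearson(columns[a], columns[b]))
                if r > cutoff and (worst is None or r > worst[0]):
                    worst = (r, a, b)
        if worst is None:
            return kept
        _, a, b = worst
        # Drop the member of the pair that is more redundant overall.
        kept.remove(a if mean_abs_cor(a) >= mean_abs_cor(b) else b)

# Toy example: two nearly collinear temperature variables and one
# precipitation variable that is only weakly related to either.
toy = {
    "mean_temp": [1, 2, 3, 4, 5, 6, 7, 8],
    "max_temp": [2.1, 3.9, 6.2, 8.0, 9.8, 12.1, 14.0, 16.2],
    "precip": [4, 8, 1, 6, 3, 7, 2, 5],
}
kept = drop_collinear(toy, cutoff=0.8)
```

In the toy example, one of the two temperature variables is removed and the precipitation variable is retained, which mirrors the reduction from the full Bioclim set to eight predictors described above.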

Finally, I obtained habitat classifications for each species from the Cornell Lab of Ornithology's All About Birds website (www.allaboutbirds.org) using an R script written by K. E. Dybala.

Introduction to stochastic neural networks

Neural networks describe nonlinear mappings from input variables to predictions about one or more output variables. In general, ecologists have not had much success using neural networks for SDM, compared with other methods (e.g. Dormann et al. 2008). However, modern neural networks have recently outperformed other machine learning techniques in a wide range of applied contexts (Bengio 2013) and are thus worth a second look.

Mistnet models are stochastic neural networks, meaning that they include latent random variables (Neal 1992; Tang & Salakhutdinov 2013). In such a model, species' occurrence probabilities are not fully specified by the variables ecologists happen to measure, but can also depend on factors that have not been observed. In the absence of any information about these variables, mistnet (like other JSDMs) represents them using standard normal distributions. Depending on which values are sampled from these normal distributions and fed through the neural network, the model will expect to see different kinds of species assemblages (Figure 3).

While the model's main function is to make predictions about the species found in a given environment, inference can also proceed backward through the network, so that the presence (or absence) of a particular species can provide indications about the local environment, and thus about the likely configuration of the rest of the assemblage. This kind of inference could be useful in a variety of important contexts. For example, data is often more plentiful about waterfowl than about other wetland species, due to interest from hunters and conservation groups. If waterfowl are known to be present along a route, then a JSDM should recognize that suitable habitat was available, automatically increasing the estimated probability of occurrence for other species known to have similar habitat requirements. Notably, none of this extra inferential power requires that the mistnet user understand which environmental factors are driving the correlations between species, since these correlations are automatically inferred from species' co-occurrence patterns.

The neural network used here (illustrated in Figure 3B) is trained to find a way of representing different environmental conditions such that each species' response to the environment can be described using a small number of coefficients (e.g. 15 in this analysis; Appendix B). The small number of coefficients and the uniformity of their functions make mistnet models highly interpretable: the coefficients linking the second hidden layer to a given species' probability of occurrence essentially describe that species' responses to a few leading principal components of environmental variation (cf. Vincent et al. 2010). For comparison, the boosted regression tree SDMs used below (Elith, Leathwick & Hastie 2008) have tens of thousands of coefficients per species, with entirely new interpretations for each new species' coefficients.

Figure 3: A. A generalized diagram for stochastic feed-forward neural networks that transform environmental variables into occurrence probabilities for multiple species. The network's hidden layers perform a nonlinear transformation of the observed and unobserved ("latent") environmental variables; each species' occurrence probability then depends on the state of the final hidden layer. B. The specific network used in this paper, with two hidden layers. The inputs include Worldclim variables involving temperature and precipitation, as well as random draws from each of the latent environmental factors. These inputs are multiplied by a coefficient matrix and then nonlinearly transformed in the first hidden layer. The second hidden layer uses a different coefficient matrix to linearly transform its inputs down to a smaller number of variables (like Principal Components Analysis of the previous layer's activations). A third matrix of coefficients links each species' occurrence probability to each of the variables in this linear summary (like one instance of logistic regression for each species). The coefficients are all learned using a variant of the backpropagation algorithm.
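The architecture in Figure 3B can be sketched as a single forward pass. The code below is an illustration rather than mistnet's implementation: the rectified-linear nonlinearity, the number of latent factors (2), the first hidden layer's width (20), the five example species, and the randomly initialized coefficients are all assumptions made for the example. Only the eight climate inputs and the 15-variable second layer come from the text.

```python
import math
import random

def init_params(rng, n_inputs, n_hidden, n_bottleneck, n_species, scale=0.3):
    """Create three (weights, bias) layers with small random coefficients."""
    def layer(n_out, n_in):
        weights = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
        return weights, [0.0] * n_out
    return [layer(n_hidden, n_inputs),
            layer(n_bottleneck, n_hidden),
            layer(n_species, n_bottleneck)]

def affine(layer, x):
    """Multiply the input by the layer's coefficient matrix and add the bias."""
    weights, bias = layer
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]

def sample_occurrence_probs(climate, params, rng, n_latent):
    """One stochastic forward pass: climate plus fresh latent draws in, probabilities out."""
    latent = [rng.gauss(0.0, 1.0) for _ in range(n_latent)]    # unobserved factors
    hidden1 = [max(0.0, v) for v in affine(params[0], list(climate) + latent)]  # nonlinear layer
    hidden2 = affine(params[1], hidden1)                       # linear summary layer
    logits = affine(params[2], hidden2)                        # one logistic regression per species
    return [1.0 / (1.0 + math.exp(-v)) for v in logits]

rng = random.Random(42)
n_climate, n_latent = 8, 2
params = init_params(rng, n_climate + n_latent, 20, 15, 5)
climate = [0.5, -1.0, 0.2, 1.3, -0.4, 0.0, 0.8, -1.1]  # one route's standardized predictors

# Two passes with the same climate give different assemblage-level expectations,
# because each pass samples new values for the latent factors.
draw_a = sample_occurrence_probs(climate, params, rng, n_latent)
draw_b = sample_occurrence_probs(climate, params, rng, n_latent)
```

Averaging many such passes would approximate the marginal occurrence probabilities, while the spread among passes reflects the non-climate heterogeneity the latent variables stand in for.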

How do we train the model to make good predictions? As with most neural networks, mistnet's coefficients are initialized randomly, and then the model climbs the log-likelihood surface by iteratively adjusting the coefficients toward better values. In mistnet models, the adjustments are calculated with a variant of the backpropagation algorithm (Rumelhart, Hinton & Williams 1986; Murphy 2012) suggested by Tang & Salakhutdinov (2013) for stochastic neural networks. The fitting procedure alternates between inferring the states of the latent variables (via importance sampling) and updating the model's coefficients (via backpropagation). Both phases of model fitting are described in more detail in Appendix B. Despite importance sampling's imprecision, this generalized expectation maximization procedure will converge to a local optimum on the likelihood surface with probability one (Neal & Hinton 1998; Tang & Salakhutdinov 2013), ensuring that the expected likelihood is high after averaging over the possible random samples. Following best practices (Orr & Müller 1998; Murphy 2012), mistnet constrains the coefficients using Li regularization to prevent overfitting; the strength of this "weight decay" term was chosen by cross-validation, as described in the Appendix.
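The importance-sampling phase can be sketched in isolation. The example below is a minimal illustration, not the procedure in Appendix B: it draws latent values from their standard normal prior, weights each draw by how well it explains the species that were actually observed, and uses the normalized weights to form conditional predictions. The three-species toy model and its coefficients are invented for the example, and the coefficient update by backpropagation is omitted.

```python
import math
import random

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def toy_model(z):
    """Hypothetical occurrence probabilities driven by one latent 'forest' factor z."""
    return {"warbler": sigmoid(3.0 * z),   # forest species
            "kinglet": sigmoid(2.0 * z),   # forest species
            "lark": sigmoid(-2.0 * z)}     # open-country species

def conditional_probabilities(model, observed, n_samples=5000, seed=0):
    """Importance-weighted occurrence probabilities given partial observations.

    observed maps species names to True (present) or False (absent).
    """
    rng = random.Random(seed)
    draws, weights = [], []
    for _ in range(n_samples):
        probs = model(rng.gauss(0.0, 1.0))  # sample the latent factor from its prior
        weight = 1.0
        for species, present in observed.items():
            # Likelihood of the observation under this latent draw.
            weight *= probs[species] if present else 1.0 - probs[species]
        draws.append(probs)
        weights.append(weight)
    total = sum(weights)
    return {species: sum(w * d[species] for w, d in zip(weights, draws)) / total
            for species in draws[0]}

prior = conditional_probabilities(toy_model, {})
given_warbler = conditional_probabilities(toy_model, {"warbler": True})
```

With no observations, every draw receives equal weight and each species sits near 50%. Once the warbler is recorded as present, draws with high values of the latent factor dominate, which raises the kinglet's probability and lowers the lark's.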

The mistnet source code can be viewed and downloaded from https://github.com/davharris/mistnet. While the user interface and most of the algorithms are written in R, a small portion of the code is written in C++, using Rcpp (Eddelbuettel & François 2011) to manage the interface between languages and RcppArmadillo (Eddelbuettel & Sanderson 2014) to access the Armadillo linear algebra library for faster matrix manipulations (Sanderson 2010).

Existing models used for comparison

I compared mistnet's predictive performance with two machine learning techniques and with a linear JSDM called BayesComm (Golding 2013; Golding & Harris 2014). Each of these techniques is described briefly below; implementational details and settings for each method can be found in the Appendix.

The first machine learning method I used for comparison, boosted regression trees (BRT; Elith et al. 2008), is among the most powerful techniques available for single-species SDM (Elith et al. 2006; Elith et al. 2008). I trained one BRT model for each species using R's gbm package (Ridgeway 2013) and stacked them following the recommendations in Calabrese et al. (2014).

I also used a neural network model with no stochastic latent variables as a baseline against which to compare mistnet. Such neural networks do share some information among species (i.e. all species' log-odds of occurrence are linear combinations of the same hidden layer), but like most other multi-species SDMs (De'ath 2002; Leathwick et al. 2005; Ferrier et al. 2007) they are not JSDMs and do not explicitly model co-occurrence (Clark et al. 2013). The neural net baseline was trained using the nnet package (Venables & Ripley 2002).

Finally, I trained a BayesComm model (Golding 2013; Golding & Harris 2014) to evaluate the importance of mistnet's nonlinearities compared to a linear alternative that also models co-occurrence explicitly.

To ensure a level playing field, each modeling approach was given about 15 hours on the same computer for cross-validation and to make its predictions, as described in the Appendix.

Evaluating model predictions along test routes

I evaluated mistnet's predictions both qualitatively and quantitatively. Qualitative assessments involved looking for patterns in the model's predictions and comparing them with ornithological knowledge (e.g. the habitat classifications provided by the Cornell Lab of Ornithology).

Each model was evaluated quantitatively on the test routes (red points in Figure 2) to assess its predictive accuracy out-of-sample. Models were scored according to their predictive likelihoods, i.e. the probabilities they assigned to various scenarios observed in the test data. Models with high likelihoods expect realistic co-occurrence patterns, and should yield more biologically relevant insights about the processes underlying those patterns. Models that overfit or underfit will have lower out-of-sample likelihoods, and should be trusted less to provide these kinds of insights. I tested each model's ability to make several kinds of predictions, ranging from estimates of the probability of observing particular species at a given location, to predictions about the species richness and composition of entire assemblages.

To quantify the difficulties each model faced as it made predictions about increasingly large assemblages, I estimated their route-level predictive likelihoods for randomly-chosen groups of species, ranging in size from individual species pairs to the full set of 368 species in the data set. Models that assumed species were uncorrelated should see an exponential decay in their likelihoods as the number of species increases (since the probability of making correct predictions for a set of uncorrelated species equals the product of their individual probabilities), while BayesComm and mistnet should be able to take advantage of correlations to simplify the problem of making predictions for the larger assemblages.
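The contrast between the two kinds of likelihood can be made concrete with a short sketch. This is an illustrative calculation, not the evaluation code used for the paper: it assumes that a latent-variable model's route-level likelihood is estimated by averaging over sampled latent states, and it uses an invented two-state example ("wet" and "dry") in which ten species all have the same occurrence probability within a state.

```python
import math

def bernoulli_loglik(probs, observed):
    """Log-probability of a presence/absence vector under independent species."""
    return sum(math.log(p if present else 1.0 - p)
               for p, present in zip(probs, observed))

def mixture_loglik(sampled_probs, observed):
    """Log of the average likelihood across sampled latent states.

    Uses the log-sum-exp trick so that very small likelihoods do not underflow.
    """
    logs = [bernoulli_loglik(probs, observed) for probs in sampled_probs]
    biggest = max(logs)
    return biggest + math.log(sum(math.exp(v - biggest) for v in logs) / len(logs))

n_species = 10
observed = [True] * n_species            # all ten species recorded on the route

# Stacked model: each species has a 50% marginal probability.
stacked = bernoulli_loglik([0.5] * n_species, observed)

# Latent-variable model: sites are either "wet" (p = 0.9) or "dry" (p = 0.1),
# with equal weight, so the marginal probabilities are still 50%.
latent = mixture_loglik([[0.9] * n_species, [0.1] * n_species], observed)
```

Both models agree about each species individually, but the stacked likelihood falls by a factor of two with every added species, while the latent-variable likelihood falls much more slowly because most of the penalty is paid once, for the latent state.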

Finally, each model predicted a range of possible species richness values for each test route; I calculated quantiles for each model's predictions using the Poisson-binomial distribution (Hong 2013), as recommended in Calabrese et al. (2014).
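For readers unfamiliar with the Poisson-binomial distribution, the sketch below computes it by direct convolution, adding one species at a time. This is a simple dynamic-programming approach chosen for clarity and is not necessarily the algorithm used by the software cited above; the function names are illustrative.

```python
def poisson_binomial_pmf(probs):
    """Distribution of the number of successes among independent Bernoulli trials.

    probs holds one occurrence probability per species; the result is a list
    whose k-th entry is the probability that exactly k species are present.
    """
    pmf = [1.0]  # with no species, richness is 0 with certainty
    for p in probs:
        updated = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            updated[k] += mass * (1.0 - p)   # this species is absent
            updated[k + 1] += mass * p       # this species is present
        pmf = updated
    return pmf

def richness_quantile(pmf, q):
    """Smallest richness value whose cumulative probability reaches q."""
    cumulative = 0.0
    for k, mass in enumerate(pmf):
        cumulative += mass
        if cumulative >= q:
            return k
    return len(pmf) - 1

pmf = poisson_binomial_pmf([0.2, 0.5, 0.9])
median_richness = richness_quantile(pmf, 0.5)
```

For a model with latent variables, one plausible approach is to compute this distribution separately for each sampled latent state and then average the resulting distributions, which is what allows the predicted richness distribution to be wider than any single Poisson-binomial.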

Results and Discussion

Mistnet's view of North American bird assemblages

I began by decomposing the variance in mistnet's species-level predictions into variation among routes (which varied in their climate values) and residual variation within routes. On average, the residuals accounted for 29% of the variance in mistnet's predictions, indicating that non-climate factors play a substantial role in habitat filtering at continental scales.

If the non-climate factors mistnet identified were biologically meaningful, then there should be a strong correspondence between the 15 coefficients assigned to each species by mistnet and the habitat classifications assigned by the Cornell Lab of Ornithology. A linear discriminant analysis (LDA; Venables & Ripley 2002) demonstrated such a correspondence (Figure 4). The two-dimensional subspace in Figure 4 explains 19% of the total variance in species' coefficients (representing an even greater portion of the non-climate variance). Mistnet's coefficients cleanly distinguished several groups of species by habitat association (e.g. "Grassland" species versus "Forest" species), though the model largely failed to distinguish "Marsh" species from "Lake/Pond" species and "Scrub" species from "Open Woodland" species. These results indicate that the model has identified the broad differences among communities, but that it lacks some fine-scale resolution for distinguishing among types of wetlands and among types of partially-wooded areas. Alternatively, perhaps these finer distinctions are not as salient at the scale of a 40-km transect.

Figure 4: Each species' mistnet coefficients have been projected into a two-dimensional space by linear discriminant analysis (LDA) in order to maximize the spread between the six habitat types assigned to species by the Cornell Lab of Ornithology's All About Birds website. The figure shows that mistnet cleanly separates "Grassland" species from "Forest" species, with "Scrub" and "Open Woodland" species representing intermediates along this axis of variation. "Marsh" and "Lake/Pond" species cluster together in the upper-left. The other habitat classes were included in the LDA, but are not shown here.

Figure 5A shows how the forest/grassland gradient identified by mistnet affects the model's predictions for a pair of species with opposite responses to forest cover. The model cannot tell which of these two species will be observed (since it was only provided with climate data), but the model has learned enough about these two species to tell that the probability of observing both along the same 40-km transect is much lower than would be expected if the species were uncorrelated.

Figure 5A reflects a great deal of uncertainty, which is appropriate considering that the model has no information about a crucial environmental variable (forest cover). Often, however, additional information is available that could help resolve this uncertainty, and the mistnet package includes a built-in way to do so, as indicated in Figures 5B and 5C. These panels show how the model is able to use an observation of a forest-associated Nashville Warbler (Oreothlypis ruficapilla) to indicate that a whole suite of other forest-dwelling species are likely to occur nearby, and that a variety of species that prefer open fields and wetlands should be absent. Similarly, Figure 5D shows how the presence of a Redhead duck (Aythya americana) can inform the model that a route is suitable habitat for a variety of other ducks, as well as for other wetland-associated species such as marsh-breeding blackbirds, sandpipers, and rails (along with a few other species that do not fit this theme as nicely). None of these inferences would be possible from a stack of disconnected single-species SDMs.

Model comparison: species richness

Environmental heterogeneity plays an especially important role in determining species richness, which is often overdispersed relative to models' expectations (O'Hara 2005). Figure 6 shows that mistnet's predictions respect the heterogeneity one might find in nature: areas with a given climate could be largely unsuitable for waterfowl (Anatid richness < 2 species) or marshy and open (Anatid richness > 10 species). Under the independence assumption used for stacking SDMs, however, both of these very plausible scenarios would be ruled out (Figure 6A).

Figure 5: A. The mistnet model has learned that Ruby-crowned Kinglets (Regulus calendula) and Horned Larks (Eremophila alpestris) have opposite responses to some environmental factor whose true value is unknown. Based on these two species' biology, an ornithologist could infer that this unobserved variable is related to forest cover, with the Kinglet favoring more forested areas and the Lark favoring more open areas. B. The presence of a forest-dwelling Nashville Warbler (Oreothlypis ruficapilla) provides the model with a strong indication that the area is forested, increasing the weight assigned to Monte Carlo samples that are suitable for the Kinglet and decreasing the weight assigned to samples that are suitable for the Lark. C. The Nashville Warbler's presence similarly suggests increased occurrence probabilities for a variety of other forest species, as well as decreased probabilities for species associated with wetlands and grasslands. D. If a Redhead (Aythya americana) has been observed along a route, the model correctly expects to see more ducks, rails and sandpipers in the same area.

Figure 6: The predicted distribution of species richness one would expect to find based on predictions from mistnet and the baseline neural network. A. Anatid species (waterfowl). B. All bird species. BRT's predictions (not shown) are similar to the baseline network, since neither one accounts for the effects of unmeasured environmental heterogeneity.



Unfortunately, stacking leads to even larger errors when predicting richness for larger groups, such as the complete set of birds studied here. Models that stacked independent predictions underestimated the range of biologically possible outcomes (Figure 6B), frequently putting million-to-one or even billion-to-one odds against species richness values that were actually observed. In more concrete terms, half of the observed species richness values fell outside these models' 95% confidence intervals. The overconfidence associated with stacked models could have serious consequences in both management and research contexts if we fail to prepare for species richness values outside such an unreasonably narrow range.

Mistnet, on the other hand, was able to explore the range of possible non-climate environments to avoid these missteps: 90% of the test routes fell within mistnet's 95% confidence intervals, and the log-likelihood ratio decisively favored it over stacked alternatives.

Model comparison: single species

The two neural network models had the best performance at the level of individual species (Table 1). The neural networks' advantage over BRT was largest for low-prevalence species (linear regression of log-likelihood ratio versus log-prevalence; p = 0.004). This is consistent with previous observations that multi-species models can outperform single-species approaches for rare species (Leathwick, Elith & Hastie 2006), which will often be of the greatest conservation concern. BayesComm's predictions were substantially worse than any of the machine learning methods, which I attribute to its inability to learn nonlinear responses to the environment.

method       expected log-likelihood   likelihood ratio
nnet         -48.7                     21.3
mistnet      -48.7                     21
BRT          -51.7                     1
BayesComm    -56.6                     0.00771

Table 1: Expected species-level log-likelihood for each method, summed over all test routes and averaged across all species. The likelihood ratio compares each model to BRT, representing single-species SDMs. Sharing information among species with either of the neural net models improves the predictive likelihood more than twenty-fold for a typical species compared to BRT. Note also that BayesComm averages less than 1% of the machine learning methods' likelihoods because of its linearity assumption.

298 Model comparison: community composition 

299 While making predictions about individual species observations is fairly straightforward 

300 with this data set (since most species have relatively narrow breeding ranges), community 

301 ecology is more concerned with co-occurrence and related patterns involving community 

302 composition (Chase 2003). As expected, models that combined their single-species predictions 

303 independently (including the neural network baseline) showed exponential decay in their 

304 likelihoods as the number of species per prediction increased. The JSDMs (mistnet and 

305 BayesComm) showed sub-exponential declines, since correlations reduce the number of 

306 independent bits of information needed to make an accurate prediction. As a result, mistnet 

307 became increasingly advantageous over independent combinations of single-species predictions 

308 as the assemblage size increased (Figure 7). Mistnet's log-likelihood averaged 10.8 units higher 

309 than BRT's for full assemblages of 368 species, corresponding to a 47,000-fold improvement 

310 in likelihood for a typical transect in the test set. Mistnet's ability to focus its predictions 

311 on plausible combinations of species indicates that it has captured a great deal more of 

312 the underlying ecological processes than existing SDM approaches. While some of this 

313 improvement can be attributed to mistnet's overall tendency to make better predictions about 

314 individual species (Table 1), the difference is mainly due to mistnet's ability to keep ahead of 

315 the combinatorial explosion of possible assemblages by exploiting correlations among species. 

Figure 7: The likelihood ratio favoring mistnet over BRT grows super-exponentially with 
assemblage size. Each circle corresponds to a randomly generated set of N species, where the 
value of N is indicated along the horizontal axis. Note the log scale on both axes. 
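
The reason a latent-variable model pulls ahead as assemblages grow can be seen in a toy simulation. The sketch below is not mistnet and does not use the Breeding Bird Survey data: the single binary latent variable, the occurrence probabilities of 0.9 and 0.1, and all function names are assumptions chosen for illustration. Species occurrences are generated from one unobserved habitat variable, and the held-out log-likelihood of the true latent-variable model is compared with that of a model that multiplies correct marginal probabilities independently.

```python
import numpy as np

def simulate(n_sites, n_species, rng):
    """Occurrences driven by one unobserved binary habitat variable per site."""
    habitat = rng.integers(0, 2, size=n_sites)      # latent state, 0 or 1
    prefers = np.arange(n_species) % 2              # habitat each species favors
    p = np.where(prefers[None, :] == habitat[:, None], 0.9, 0.1)
    y = (rng.random((n_sites, n_species)) < p).astype(int)
    return y, prefers

def loglik_independent(y):
    """Each species has marginal occurrence probability 0.5 in this setup."""
    return np.full(y.shape[0], y.shape[1] * np.log(0.5))

def loglik_latent(y, prefers):
    """Marginalize over the two latent habitat states."""
    per_state = []
    for habitat in (0, 1):
        p = np.where(prefers == habitat, 0.9, 0.1)
        ll = (y * np.log(p) + (1 - y) * np.log(1 - p)).sum(axis=1)
        per_state.append(ll + np.log(0.5))          # prior weight of each state
    return np.logaddexp(per_state[0], per_state[1])

def mean_gain(n_species, n_sites=5000, seed=0):
    """Average log-likelihood advantage of the latent model per site."""
    rng = np.random.default_rng(seed)
    y, prefers = simulate(n_sites, n_species, rng)
    return float(np.mean(loglik_latent(y, prefers) - loglik_independent(y)))

for n in (1, 2, 8, 32):
    print(n, round(mean_gain(n), 3))
```

With a single species the two models are identical, so the gain is zero. The gain then grows with the number of species, because each additional species is partly predictable from the others once the latent state has been inferred.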



316 Comparison with BayesComm 

317 BayesComm's ability to make out-of-sample predictions was severely limited by its assumption 

318 that species respond linearly to climate variables, highlighting the need for nonlinear 

319 methods that can learn the functional forms of species' responses to the environment. Adding 

320 quadratic and interaction terms would have led to severe overfitting for many rare species, 

321 and may still not have provided enough flexibility to compete with nonlinear techniques. 

322 Even without the added complexity of nonlinear terms, the BayesComm model required 

323 70,000 parameters, most of which served to identify a distinct correlation coefficient 

324 between a single pair of species. Tracing this many parameters through hundreds of Markov 

325 chain iterations routinely caused BayesComm to exceed my machine's 8 gigabytes of memory 

326 and crash, even after the code was modified to reduce its memory footprint. Storing long 

327 Markov chains over a dense, full-rank covariance matrix (as has apparently been done in all 

328 other JSDMs to date) thus appears not to be a feasible strategy with large assemblages. 
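
The scale of the problem follows from simple counting: a full correlation matrix over n species has n(n-1)/2 distinct off-diagonal entries. The sketch below works this out for 368 species and gives a lower bound on the memory needed to store a chain of those values. The chain length of 1000 iterations and the 8 bytes per stored value are assumptions for illustration, and the function names are hypothetical. Actual memory use in any particular implementation will be higher and depends on details not described here.

```python
def n_species_pairs(n_species):
    """Number of distinct pairwise correlations among n_species species."""
    return n_species * (n_species - 1) // 2

def chain_storage_bytes(n_params, n_iterations, bytes_per_value=8):
    """Lower bound on memory for storing every parameter at every iteration."""
    return n_params * n_iterations * bytes_per_value

pairs = n_species_pairs(368)
print(pairs)  # 67528

# Hypothetical chain length; 8 bytes assumes double-precision storage.
raw_bytes = chain_storage_bytes(pairs, n_iterations=1000)
print(f"{raw_bytes / 1e9:.2f} GB for correlations alone")
```

The pair count accounts for most of the roughly 70,000 parameters mentioned above, and it grows quadratically with the number of species.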

329 Conclusion 

330 These results show conclusively that both linearity and independence are unwarranted 

331 assumptions; either assumption can substantially impair our ability to model and understand 

332 large assemblages. Linear JSDMs are not flexible enough, and models without latent random 

333 variables cannot match the properties of real assemblages. 

334 SDMs' failure to sufficiently consider correlations among species has kept these models from explaining 

335 and anticipating the full range of complex assemblages found in nature (Austin & Van Niel 

336 2011). Mistnet's predictions are much more compatible with these sorts of complexities. In 

337 particular, the model's predictions need not be unimodal, allowing the model to express 

338 conditional predictions, such as that "the probability of observing a Redhead duck will be very 

339 high if other wetland species are present, but very low otherwise." Such conditional predictions 

340 are important because the available data will not always contain enough information to 

341 narrow the possibilities down to a single assemblage type or a single group of species. In 

342 such situations, stacked models will provide a false sense of security out-of-sample, leading 

343 to bad decision-making and biased estimates of nature's variability. Mistnet provides better 

344 confidence intervals that are much more likely to actually contain the observed values when 

345 we look out-of-sample. 
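
A conditional prediction of this kind can be illustrated with a two-state latent variable and Bayes' rule. The sketch below is a toy calculation, not mistnet's actual inference procedure: the single "wetland" state, the probabilities, and the function name are all hypothetical, and the two species are assumed to be independent once the latent state is known.

```python
def occurrence_given_indicator(p_state, p_focal, p_indicator, indicator_present):
    """P(focal species present | indicator species present or absent).

    p_state:     prior probability that the site is in the latent state
    p_focal:     (P(focal | state), P(focal | not state))
    p_indicator: (P(indicator | state), P(indicator | not state))
    """
    like_in, like_out = p_indicator
    if not indicator_present:
        like_in, like_out = 1.0 - like_in, 1.0 - like_out
    # Bayes' rule: posterior probability of the latent state given the indicator.
    posterior = p_state * like_in / (p_state * like_in + (1.0 - p_state) * like_out)
    return posterior * p_focal[0] + (1.0 - posterior) * p_focal[1]

# Hypothetical numbers: 20% of sites are wetlands; the focal duck is common
# in wetlands and nearly absent elsewhere, as is another wetland species.
p_wetland = 0.2
duck = (0.8, 0.01)
other_wetland_species = (0.9, 0.02)

p_if_present = occurrence_given_indicator(p_wetland, duck, other_wetland_species, True)
p_if_absent = occurrence_given_indicator(p_wetland, duck, other_wetland_species, False)
p_marginal = p_wetland * duck[0] + (1.0 - p_wetland) * duck[1]
print(round(p_if_present, 3), round(p_if_absent, 3), round(p_marginal, 3))
```

The marginal probability sits between the two conditional values and describes neither situation well, which is the sense in which a single unimodal prediction can mislead.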

346 Mistnet can also identify some of the same similarities among species that a skilled biologist 

347 would expect to find, which will be important for studying taxa that are more diverse and 

348 harder to observe (such as microbes). For taxa on the frontier of our knowledge, a model 

349 like mistnet could help guide biologists to ask the best questions and organize their 

350 understanding by suggesting which species have similar habitat requirements — even when 

351 the factors controlling their occurrence are still unknown. 

352 Unlike with stacked methods, one can read this straight out of mistnet's coefficient tables 

353 with no more difficulty than interpreting a Principal Components Analysis. 

354 Mistnet's ability to use asymmetrical or low-quality data sources to improve its predictions 

355 should increase the value of low-effort data collection procedures such as short transects, 

356 especially since these improvements can be incorporated without the need to fit a new model. 

357 Future research should look for ways to use other forms of ecological knowledge about species 

358 to impose some structure on model coefficients and nudge the models toward more biologically 

359 reasonable predictions (Kearney & Porter 2009; Kissling et al. 2012). Such a research program 

360 could also be useful in other areas of predictive ecology [@pearse predicting 2013]. 

361 Finally, it should be noted that, while one could describe direct interactions among species 

362 using latent variables (Ovaskainen et al. 2010; Golding 2013), existing JSDMs are not 

363 particularly well-suited for learning about species interactions. Other models, such as Markov 

364 random fields (Azaele et al. 2010), or ensembles of classifier chains (Yu et al. 2011) would 

365 be much more appropriate for inferring coefficients related to species interactions, as they 

366 include direct dependencies among species. Latent variable-based JSDMs, including mistnet, 
367 are more appropriate for studies like this one at large spatial scales where direct species 

368 interactions will tend to be weaker and most of the variation is driven by environmental 

369 filtering and species' range limits. 

370 In conclusion, mistnet's accuracy, as well as its flexibility to work with opportunistic samples, 

371 should make it useful for a variety of basic and applied contexts. Assemblage-level models, 

372 such as mistnet, also have the potential to yield new biological insights. With charismatic and 

373 well-studied species like North American birds, most models will mainly be telling us things 

374 that we already know. Still, mistnet's ability to capture useful information about axes of 

375 variation among birds and to match preconceptions about which species co-occur due to 

376 habitat variables may indicate that the model can teach us new things about taxa that are 

377 harder to study. 